2,314 research outputs found

Three machine learning models for the 2019 Solubility Challenge

Author: Mitchell John B. O.
Publication venue: 'International Association of Physical Chemists (IAPC)'
Publication date: 01/01/2020

We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. On the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform a little better than Extra Trees, and about as well as one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.

PubMed Central

University of St. Andrews - Pure

HRČAK - Portal of Croatian Scientific and Professional Journals

St Andrews Research Repository

Hrčak - Portal of scientific journals of Croatia

We are probably not Sims

Author: Mitchell John B. O.
Publication date: 01/04/2020

In this article, I discuss the current state of the debate around the simulation hypothesis, the idea that the world we inhabit is a computer simulation in or within another universe. Considering recent work from a range of authors, I suggest that statistical arguments in favour of a simulated world are naive and fail to account either for Ockham’s Razor or for alternative existential possibilities besides base reality and a simulation. Most significantly, I observe that it would be computationally impossible in our own universe to simulate a similar cosmos at fine granularity. This implies substantial differences in size and information content between simulating and simulated universes. I argue that this makes serious analysis of the simulation argument extremely difficult. I suggest that Christian theology has no reason to reinvent itself to accommodate simulism; the two should be viewed as mutually exclusive world-views. Further, I note that the existence of a human soul or spirit, or indeed any non-reductionist explanation of human consciousness, could undermine the assumption of substrate independence that simulism requires.

University of St. Andrews - Pure

St Andrews Research Repository

Chemistry in Bioinformatics

Authors: Mitchell John B O; Murray-Rust Peter; Rzepa Henry S
Publication date: 19/05/2005

A preprint of an invited submission to BioMedCentral Bioinformatics. This short manuscript is an overview of the current problems and opportunities in publishing chemical information. Full details of the technology are given in the sibling manuscript http://www.dspace.cam.ac.uk/handle/1810/34579. The manuscript is the authors' preprint, although it has been automatically transformed into this archived PDF by the submission system. The authors are not responsible for the formatting.
Chemical information is now seen as critical for most areas of life sciences. But unlike Bioinformatics, where data is Openly available and freely re-usable, most chemical information is closed and cannot be re-distributed without permission. This has led to a failure to adopt modern informatics and software techniques and therefore a paucity of chemistry in bioinformatics. New technology, however, offers the hope of making chemical data (compounds and properties) Free during the authoring process. We argue that the technology is already available; we require a collective agreement to enhance publication protocols.

PubMed Central

Spiral - Imperial College Digital Repository

Apollo (Cambridge)

University of St. Andrews - Pure

Verifying the fully “Laplacianised” posterior Naïve Bayesian approach and more

Authors: Glen Robert; Marcus David; Mitchell John B. O.; Mussa Hamse Yussuf
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/05/2015

Mussa and Glen would like to thank Unilever for financial support, whereas Mussa and Mitchell thank the BBSRC for funding this research through grant BB/I00596X/1. Mitchell thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support.
Background In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the “Laplacian Corrected Modified Naïve Bayes” (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG’s work and introduces a new version of the SNB classifier: “Tapered Naïve Bayes” (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.
Results LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the “optimal” number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the “optimal” number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the “optimal” number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
Conclusions The classification results obtained in this study concur with the mathematically based guidelines given in MMG’s paper: ignoring the role of absence of a feature out of hand does not necessarily improve classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.

PubMed Central

Spiral - Imperial College Digital Repository

University of St. Andrews - Pure

St Andrews Research Repository

Enzyme function and its evolution

Author: Mitchell John B. O.
Publication venue: 'Elsevier BV'
Publication date: 01/12/2017

With rapid increases over recent years in the determination of protein sequence and structure, alongside knowledge of thousands of enzyme functions and hundreds of chemical mechanisms, it is now possible to combine breadth and depth in our understanding of enzyme evolution. Phylogenetics continues to move forward, though determining correct evolutionary family trees is not trivial. Protein function prediction has spawned a variety of promising methods that offer the prospect of identifying enzymes across the whole range of chemical functions and over numerous species. This knowledge is essential for understanding antibiotic resistance, as well as for protein re-engineering and de novo enzyme design.

University of St. Andrews - Pure

St Andrews Research Repository

Predicting melting points of organic molecules : applications to aqueous solubility prediction using the General Solubility Equation

Authors: McDonagh James; Mitchell John B. O.; van Mourik Tanja
Publication venue: 'Wiley'
Publication date: 15/07/2015

In this work we make predictions of several molecular properties of academic and industrial importance to seek answers to two questions: 1) Can we apply efficient machine learning techniques, using inexpensive descriptors, to predict melting points to a reasonable level of accuracy? 2) Can values of this level of accuracy be usefully applied to predicting aqueous solubility? We present predictions of melting points made by several novel machine learning models, previously applied to solubility prediction. Additionally, we make predictions of solubility via the General Solubility Equation (GSE) and monitor the impact of varying the logP prediction model (AlogP and XlogP) on the GSE. We note that the machine learning models presented, using a modest number of 2D descriptors, can make melting point predictions in line with the current state-of-the-art prediction methods (RMSE ≥ 40 °C). We also find that predicted melting points, with an RMSE of tens of degrees Celsius, can be usefully applied to the GSE to yield accurate solubility predictions (log10S RMSE < 1) over a small dataset of druglike molecules.
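
As a rough illustration of the General Solubility Equation named in the abstract above, the sketch below assumes the commonly cited form log10 S = 0.5 - 0.01 (MP - 25) - logP, with the melting point MP in °C and the melting term set to zero for compounds that melt at or below 25 °C. The function name and the example values are hypothetical and are not taken from the paper; the listing itself does not state which variant of the equation the authors used.

```python
def gse_log_s(melting_point_c: float, log_p: float) -> float:
    """Estimate log10 of molar aqueous solubility with the assumed GSE form.

    Assumption: log10 S = 0.5 - 0.01 * (MP - 25) - logP, where the melting
    term is dropped for compounds that are liquid at 25 degrees Celsius.
    """
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_p


# Hypothetical compound: "true" melting point 125 C, logP 2.0.
reference = gse_log_s(125.0, 2.0)

# Same compound with a melting point over-predicted by 40 C.
shifted = gse_log_s(165.0, 2.0)

# Under this form, each degree of melting point error moves log10 S by 0.01.
print(reference, shifted, reference - shifted)
```

In this assumed form, a 40 °C melting point error changes the estimated log10 S by about 0.4 log units, which is in line with the abstract's remark that melting points predicted to within tens of degrees can still be useful inputs to the GSE.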

The University of Manchester - Institutional Repository

University of St. Andrews - Pure

St Andrews Research Repository

Computational insights into the catalytic mechanism of Is-PETase : an enzyme capable of degrading poly(ethylene) terephthalate

Authors: Buehl Michael; Mitchell John B. O.; Shrimpton-Phoenix Eugene
Publication venue: 'Wiley'
Publication date: 25/10/2022

This work was supported through a studentship from BBSRC in the EastBio doctoral training programme for E. S.-P.
Is-PETase has become an enzyme of significant interest due to its ability to catalyse the degradation of polyethylene terephthalate (PET) at mesophilic temperatures. We performed hybrid quantum mechanics and molecular mechanics (QM/MM) calculations at the DSD-PBEP86-D3/ma-def2-TZVP/CHARMM27//rev-PBE-D3/def2-SVP/CHARMM level to calculate the energy profile for the degradation of a suitable PET model by this enzyme. Very low overall barriers are computed for serine protease-type hydrolysis steps (as low as 34.1 kJ mol⁻¹). Spontaneous deprotonation of the final product, terephthalic acid, with a high computed driving force indicates that product release could be rate limiting.

University of St. Andrews - Pure

St Andrews Research Repository

Bony pelvis dimensions in women with and without stress urinary incontinence

Authors: Berger Mitchell B.; DeLancey John O.; Doumouchtsis Stergios K.
Publication venue: 'Wiley'
Publication date: 01/01/2013

Aims To test the null hypothesis that bony pelvis dimensions are similar in women with and without stress urinary incontinence (SUI), both in the postpartum and midlife periods.
Methods Secondary analyses were performed of two case–control studies comparing women with SUI to asymptomatic controls. One study examined primiparas in the first 9–12 months postpartum; the other study involved middle‐aged women. SUI was confirmed by full‐bladder stress test. All subjects underwent pelvic magnetic resonance imaging. The interspinous and intertuberous diameters, subpubic angle, and sacrococcygeal joint‐to‐the inferior pubic point distance were measured from the images independently by two authors.
Results In the young cohorts, we compared primiparas with de novo postpartum SUI to both continent primiparas and nulliparas. Postpartum SUI is associated with a wider subpubic angle. There is also a trend towards wider interspinous and intertuberous diameters in the stress‐incontinent primiparas as compared to the continent cohorts, although this did not reach statistical significance with our sample sizes. By contrast, no significant differences in bony pelvis dimensions were identified when comparing middle‐aged women with SUI and their continent controls.
Conclusions Bony pelvis dimensions are different in women with SUI than in matched continent controls. However, these differences are only identified in young primiparas in the postpartum period, not in middle‐aged women.
Neurourol. Urodynam. 32: 37–42, 2013. © 2012 Wiley Periodicals, Inc.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/95230/1/22275_ftp.pd

PubMed Central

Deep Blue Documents at the University of Michigan

A Bayesian network structure learning approach to identify genes associated with stress in spleens of chickens

Authors: Mitchell John B. O.; Smith V. Anne; Videla Rodriguez Emiliano Ariel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/05/2022

This work was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 812777.
Differences in the expression patterns of genes have been used to measure the effects of non-stress or stress conditions in poultry species. However, the list of genes identified can be extensive and they might be related to several biological systems. Therefore, the aim of this study was to identify a small set of genes closely associated with stress in a poultry animal model, the chicken (Gallus gallus), by reusing and combining data previously published together with bioinformatic analysis and Bayesian networks in a multi-step approach. Two datasets were collected from publicly available repositories and pre-processed. Bioinformatics analyses were performed to identify genes common to both datasets that showed differential expression patterns between non-stress and stress conditions. Bayesian networks were learnt using a Simulated Annealing algorithm implemented in the software Banjo. The structure of the Bayesian network consisted of 16 out of 19 genes together with the stress condition. Network structure showed CARD19 directly connected to the stress condition plus highlighted CYGB, BRAT1, and EPN3 as relevant, suggesting these genes could play a role in stress. The biological functionality of these genes is related to damage, apoptosis, and oxygen provision, and they could potentially be further explored as biomarkers of stress.

PubMed Central

University of St. Andrews - Pure

St Andrews Research Repository